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ABSTRACT 


Quality of software not only is vital to the success operation of 
the Space Station, it is also an important factor in establishing testing 
requirements, time needed for software verification and integration as 
well as launching schedules for the Space Station. Defense of 
management decisions can be greatly strengthened by combining 
engineering judgements with statistical analysis. Unlike hardware, 
software has the characteristics of no wearout and costly redunduncies, 
thus making traditional statistical analysis not suitable in evaluating 
reliability of software. 

A statistical model was developed to provide a representation of 
the number as well as types of failures occur during software testing 
and verification. From this model, quantitative measure of software 
reliability based on failure history during testing are derived. Criteria 
to terminate testing based on reliability objectives and methods to 
estimate the expected number of fixings required are also presented 
here. 



INTRODUCTION 


The purpose of Multisystem Integration Facility (MSIF) is to 
provide a facility on which information systems for the Space Station, 
which are produced by different developers, may be integrated, tested, 
verified, certified for flight, and packaged for launch. The MSIF concept 
was motivated by the facts that Space Station softwares are being 
developed by multiple developers at different sites. The systems are 
highly distributed and will be built up in phases, over a number of 
launches. Several upgrades and changes will take place over the life of 
the Space Station. MSIF will be required to first perform testing using 
computer models of all the Space Station systems. As real systems are 
delivered at MSIF, testing will be performed using combinations of 
models and real systems. The final test will be one in which all systems 
are actual flight-ready versions. Since the correction of errors found 
during multisystem integration is the responsibility of the developer, 
control over delivered systems may be returned to the developer for the 
correction of errors, and then back to the MSIF to continue testing. 

Software is an important element of the Space Station, and is 
vital to its successful operation. Failure of softwares can be 
life-threatening in some cases. In addition, the quality of software can 
greatly affect the amount of fixings required during the testing, 
integration or verification process, thus making it possible to cause 
delays in launching of the Space Station, which is scheduled to begin in 
January, 1994. Consequently there is an urgent need to search for a 
quantitative measure of the reliability of the software, and to develop 
methods of combining reliability of software and hardware elements of 
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the Space Station to establish the system reliability of the Station. The 
concept of software reliability differs from that of hardware reliability 
in that failure is not due to a "wearing out" process. Software failures 
are in fact errors which, owing to hte complexity of a computer program, 
do not become evident until the combination of conditions bring the error 
to light. Unlike the hardware bathtub curve, there is no wearout 
characteristics, but only a continuing burn-in. Once a software error is 
identified and properly fixed, it is in general, fixed for all time. 

However, the large number of possible paths and its inputs in a space 
station software makes complete testing of the software generally 
impossible. 

Several approaches are currently available for testing of a 
software: path testing, functional testing and formal proofs of 
correctness. A complete functional test would verify that the correct 
output is produced for each input . It would consist of subjecting the 
program to all possible input streams. However, a ten-character string 
has 2 80 possible input streams and corresponding outputs. So complete 
functional testing in this sense is clearly impractical. In path testing, 
one would design a sufficient number of test cases to assume that every 
path through the routine is exercised at least once. But most often, even 
the number of paths through a small routine can be astronomical to 
permit all paths to be tested. As for formal proofs of correctness, each 
program statement is examined and used in a step of an inductive proof 
that the routine will produce the correct output as stated by formal 
mathematics. The practical issue here is that such proofs are very 
expensive and have been applied only to numerical routines. Not only are 
all known approaches to absolute demonstrations of error-free 
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impractical, they are impossible as well. 

Because exhaustively tested and error-free software are not 
made possible by current acceptance procedures, purchaser of a software 
product is provided with no quantitative information on which to base an 
acceptance decision and is thus forced to make these decisions based 
mostly on intuition and his own experience in similiar situation. 

Therefore our goal should be to provide sufficient testing to assure that 
the probability of failure due to hibernating errors is sufficiently low to 
be acceptable. It is expected the level of testing required will depend on 
the system/component, criticality and complexity, state of development 
and cost and usage of the system. 

Software reliability is defined here as the probability that a 
given software operates for some time period without software error 
detectable by executing the codes on the machine for which it was 
designed, assuming that it is used within design limits. Such being the 
case, test cases should be designed to cover the operating scenarios of 
the information system designed. When softwares are delivered to MSIF, 
they have already been successfully tested on the flight-compatible 
hardware. MSIF testing will start using models of other systems, and 
progress to using delivered versions of the other systems. Current 
concepts of MSIF requires if errors are detected, the software be 
returned to its developer with descrepancy reports of the errors found. 
After proper fixing, the software is returned to MSIF for retesting. In 
general, while fixing the errors found, new errors are also introduced. A 
portion of the old errors persisted, and will reoccur during the retesting. 
This process is repeated until a decision is made about the quality of the 
software based on the testing results. In the past it normally means the 
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software operates successfully on all test cases it was subjected to. 

The goals of this paper are: 

1 . Develop a statistical model to describe the failures behaviour 
during testing. 

2. Obtain a statistical measure of software reliability, based on 
failure history observed during testing. 

3. According to prespecified software reliability and failure 
history, establish criteria as to when testing of software can 
stop. 

4. Combining reliability of software and hardware elements of the 
Space Station to establish "system" reliability of the Station. 

5. Specified types of error records to be maintained during testing 
so that they can be used for later statistical analysis. 

These goals are motivated by the fact that in the Space Shuttle 
program, the extent and degree of testings performed on space softwares 
have generally been made based on management judements. Verification 
requirements are to be determined individually for each hardware/ 
software product, based on criticality and risk associated with the 
hardware/software when it is integrated into the operational 
enviroment. 
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DESCRIPTION OF THE MODEL 


The model of interested here is the failure behavior of a 
software after it is delivered to MSIF for testing and integration. A 
number of test cases designed to cover a selection of the enviroment in 
which the software will be used are run and errors occured during 
execution are recorded in descrepancy reports. The software is then 
returned to its developer for fixing. After proper fixing, it is returned to 
MSIF for retesting, where a portion of the old errors may reoccur and 
some new errors are detected. 


Assumptions of the Model 

1 . All errors are caused only by the faults in the software, thus all 
others involved in testing are assumed to possess high fidelity. 

2. All errors occure during testing are observed. 

3. The number of new errors found are statistically independent of 
the total number of errors found during the previous trial. 

4. The failure rate of new errors for each trial is dependent of the 
number of fixings already performed on the software. 

5. The number of test cases run during each trial remains relatively 
constant. 

6. All persisted errors are statistically independent of each other. 

7. The number of new errors observed during each trial follows a 
poisson process. 

Although it is possible that failure rate of new errors found 
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during each trial can be directly proportional, constant, or inversely 
proportional to the number of fixings performed, historical information 
gathered during the development of the Shuttle Orbiter primary flight 
software indicates a decreasing trend. 

During each trial, the total number of errors detected consists of 
two independent entities: new and persisted errors. 

Let 

N k = total number of errors detected during kth trial 
X k = number of new errors detected during kth trial, after (k-1) 
fixings by the developer 

R k = number of persisting errors from the previous trial. 

Thus, the total number of errors detected at each trial is the sum of the 
number of new errors introduced by the last fixing and the number of 
errors persisted from the last trial, i.e. 

N k " x k + R k- 


Analysis of the Model 
Let 

p k * probability of an error found in kth trial to persist in (k+1)th 
trial 

If the number of errors found during kth trial is n k , then the 
probabaility of r errors persist in (k+1 )th trial after fixing is: 
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Prob(R k+1 = r | N k = n k ) = C(n k ,r)p k r (1-p k ) r V r , 


r = 0,1,2, , n k 


where 

C(n k ,r) = the number of unordered samples of size r taken from n k . 

This is the conditional probability density of persisting errors based on 
the total number of errors found in previous trial and it follows a 
binomial distribution B(n k , p k ). As defined in the model, the total 

number of errors in the next trial is determined by the sum of two 
independent random variables, namely the numbers of new and persisted 
errors. This implies future errors is dependent of the number of errors 
found at present through persisted errors. Therefore the conditional 
probability density function of future errors based on present condition 
is: 


Prob(N k+1 = n | N k = n k ) = B(n k , p k )*g(X k+1 ). 


where 

9( x k+l) is the Probability density function of the number of new 
errors found during (k+1 )th trial, and B*g represents the convolution of 
the two random variables, R k+1 and X k+1 . 

By the assumption that X k+1 follows a poisson distribution with mean 

A. k+1 , the convolution of a binomial and poisson distributions is given as 
follows: 
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min(n,n k ) 

Prob(N k+1 - n | N k - n) = X C(n k , j)p k i(1-p k ) n k'i exp(- k k+1 )( ». k+1 ) n -i/j! 

j-0 

If p k is relatively small, then the density function of B(n k , p k ) can be 
approximated with a poisson distribution, and the convolution of R k+1 
and X k+1 is a poisson process given as: 


Prob(N k+ i - n | N k - n k ) - exp(- X k+1 -p k+1 )(X k+1 +P k+1 ) n /n! 


with mean = X k+1 + n k+1 and n k+1 = n R p k . 

In the case where n = 0, this conditional density function 
estimates the software reliability based on number of failures observed. 


Criteria for Termination of Testing : 

Extent of testing for softwares shall be conducted at a level 
consistant with its criticality level associated with the Space Station. 
Softwares that are highly critical to the successful operation of the 
Station will require high reliability, thus more detail testing than the 
others. Current concept documents of MSIF states thaat the degree to 
which a system is tested at MSIF depends on its risk category and how 
tightly coupled it is with the Data Management Systems (DMS). Suppose 
a particular software is required to have a minimum reliability level, 
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say REL, during the mission period T. In particular, REL is defined as the 
probability no error will occur during the mission period, and (1-REL) is 
the probability any error is detected during T. Hence the following 
inequality must be satisfied in order to meet the required reliability 
level REL 


REL > Prob(N k+1 * 0 | N k = n k ) 


Using the approximation of a poisson process, it is thus suffucient to 
solve for n k * such that: 

REL £ exp(- X k+1 -n k p k ) 


or by taking logorithm on both sides, 

n k *<> [ -ln(REL) - X k+1 ] / p k (1) 


Equation (1) gives the set (k, n k ) which yields the required reliability 
level REL. Since n k ^ 0, it implies that the earliest time testing may 
terminate can be obtained by setting n k * « 0 and then solve for a 
smallest k that satisfies: 


X k+1 <-ln(REL). 


Let m be the expected number of fixings required before a software 
passes the testing and 
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m = E[k] 


oo 



k Protesting is terminated after k fixings) 


k = 0 


- I 


k Prob(N k ■ n k , n k >0) 


k = 0 


which can be obtained by applying properties of conditional probabilities 
as follows: 


* * 


Prob(N k = n k , n k >0) 


= X Prob(N k = n k *, n k * > 0, N k _ 1 = n k-1 ) 
n k _i = 0 


oo 

- 2* Prob(N k - n k ‘, n k * £ 0 1 N k . 1 . n k _-, ) Prob(N k _ 1 - n M ) 
n k-1“° 


and 


Prob(Nj - rij) - X Prob(Nj - rij | N H = ry,) ProbfNj^ - rij,-,) , . 
n j-l ”° 

for j * 1 , 2 n k-1 . 
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Example 


Suppose from historical data, one will observe X k new errors 

after k fixings and X k follows a poisson process with mean X k = 

0.5exp(-k) per KSLOC 1 per hour in execution time. Ten percent of the 
errors found during a trial are corrected by the developer after one 
fixing. One hundred test cases which takes a total of 1 00 hours to 
execute are designed to test the Atmosphere Control and Supply (ACS) 
subsystem software, which provides total and partial pressure control 
within the pressurized habitation module in the Space Station. Such 
system consists of 10 KSLOC and is classified as criticality level 1 
which requires a minimum reliability level of 0.999 during its useful life 
cycle of 20 years, i.e. the probability of no error being detected during 
the next 20 years of operation is 0.999. 

By adjusting the unit of measurement, the mean number of 
failures for the ACS software during each trial period which lasts 1 00 
hours is: 

X k = 50exp(-k) 

The number of hours in 20 years - 1 75,200 and is equivalent to 1 ,752 
trial periods. 

To achieve reliability of 0.999 during the next 20 years, the software 
should achieve a minimum reliability level REL* during the test period of 
100 hours where 

(REL*) 1752 » 0.999 

or REL = 0.9999994 

1 KSLOC ■ thousand source line of codes 
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The minimum number of fixings needed for the ACS software based on 
the required relaibility level is given in Table 1 . It shows that for 
required reliability of 0.9999994, the earliest time testing can 
terminate is after the 1 8th fixing when no error is detected during that 
trial. An estimate of the software reliability during the next period 
based on selected number of errors observed during testing are given in 
Table 2. 


Table 1 . The Minimum Number of Fixings Needed for the ACS 
Software Based on Required Reliability Level 


Reliab ilit y Le yeU BED 

0.90 

0.95 

0.99 

0.999 

0.9999 

0.99999 

0.999999 

0.9999997 

0.9999999 


Minimum Number of Fixings (10 
6 
6 
8 
10 
13 
15 

17 

18 
19 
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Table 2. Software Reliability Based on Number of Errors Observed 
Trial (k) # Errors Observed During Testing (N^) Reliability 


1 

0 

0.0011514 

1 

1 

0.0010418 

2 

0 

0.0829635 

2 

1 

0.0750685 

3 

0 

0.4002035 

3 

1 

0.3621191 

4 

0 

0.7139821 

4 

1 

0.6460377 

5 

0 

0.8834349 

5 

1 

0.7993650 

6 

0 

0.9554297 

6 

1 

0.8645085 

7 

0 

0.9833667 

7 

1 

0.8897870 

8 

0 

0.9938485 

8 

1 

0.8992713 

9 

0 

0.9977325 

9 

1 

0.9027858 

9 

2 

0.8168743 

10 

0 

0.9991652 

10 

1 

0.9040821 

10 

2 

0.8180473 

11 

0 

0.9996928 

11 

1 

0.9045595 

11 

2 

0.8184792 

12 

0 

0.9998869 

12 

1 

0.9047351 

12 

2 

0.8186382 

13 

0 

0.9999584 

13 

1 

0.9047998 

13 

2 

0.8186967 

14 

0 

0.9999847 

15 

0 

0.9999943 

16 

0 

0.9999979 

16 

1 

0.9048355 

17 

0 

0.9999992 

18 

0 

0.9999997 

19 

0 

0.9999999 




Certain data concerning the software will have to be collected 
in order to verify the statistical model described here. This includes: 

1 . The frequency of persisted errors from previous trial and new 
errors that are introduced during the fixing effort. 

2. The number of fixings performed by the developer when these 
errors were found. 

3. Criticality level of errors found at each trial. 

4. A function which relates the failure rates of new errors at each 
trial with the number of fixings performed. 

5. Probability distributions of the number of new errors detected at 
each trial. 

6. Probability an identified error is corrected by the developer 
through one fixing. 

7. Number of test cases applied on each trial. 

8. Size of the software. 

The function which relates failure rates of new errors with the 
number of fixings can be obtained by applying regression analysis on the 
frequency of failures obtained through historical data. Since a perfect 
fit of n experimental data may require a polynomial of degree (n-1), 
techniques of selecting a "sufficient" function may be needed to reduce 
this polynomial to an acceptable form. If distributions of failures are 
unknown, a Chi-square goodness-of-fit test can be employed to test for 
pattern of distributions. Data from the Space Shuttle softwares were 
not applied to this model because they made no distinction between 
persisted and new errors. However, the data did support a decreasing 
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trend with the number of fixings. When available, the amount of time and 
cost associated with each fixing can be combined with the amount of 
computer execution times required during testing to establish testing 
schedules and total cost of softwar testing so that launch schedules and 
budgets are met. Factor like degree of interaction with other systems is 
important in determining the reliability of a distributed system, and 
therefore should also be included in the data when available. 


System Reliability 

The Challenger accident shows that in case of accidents, defense 
of management decisions can be greatly strengthened if they are made 
based on combination of statistical analysis and engineering judgements. 

In the context of manned Space Station, it is important to explore 
reliability theory to assess the risk of extended human presence in 
space. Most Space Station systems are complex systems composed of 
hardware and software, both of which are required to be in operational 
states in order for the system to perform its designed function. It is 
thus necessary to include software as part of the components which 
form the reliability network of the Space Station. It has been shown 
that a system with subsystems and components in series will have 
reliability less than that of its weakest link. Suppose the ACS 
subsystem hardware in the habitation module has failure times which 
follows an exponential distribution with MTBF = 1 00 years and 
reliability of its software which interacts with the Data Management 
Systems (DMS) is 0.999 during the next 20 years. Then the ACS 
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subsystem will have reliability less than those of the hardware, 
software and DMS alone. In particular, if no scheduled maintenance work 
are to be performed during the next 20 years and the DMS has reliability 
of 0.995, then the reliability of the ACS subsystem is: 

Reliability - exp(-20/100) x 0.999 x 0.995 
= 0.8138224. 

This demonstrates an important fact that the reliability of a complex 
system decreases rapidly as more subsystems are added to the system 
design. For instance, a system which requires five components in series 
configuration will have reliability of 0.77 if each of the component was 
tested to have reliability of 0.95. Thus in order to achieve the goal of a 
highly reliable system, efforts should be made to obtain highly reliable 
hardware as well as software through either engineering design or 
testing. While traditional methods of redunduncy works well with 
hardware, it can be quite costly for complex software, as redunduncy in 
software normally means an independent development of the computer 
program. It is this unique characteristic of software which makes 
software testing an effective method to maintain software quality. 
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CONCLUSION 


The lack of quantitative method to evaluate reliability of 
software delivered by the developer motivated the statistical model 
designed here. The unique characteristics of no wearout and costly 
redunduncy has made software testing an only way besides software 
design to maintain software quality. The model developed here 
represents the failure pattern during software testing, which includes 
new errors introduced by the fixing and persisted errors from previous 
trial. Quantitative approaches were derived to predict the software 
reliability and criteria to terminate testing based on failure history. 

These results can be applied to enhance the safety of the Space Station 
and to avoid delays in launch schedules due to delay in the software 
verification process. 
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